ABSTRACT

Researchers have proposed a variety of metrics to measure 
important graph properties, for instance, in social, biologi- 
cal, and computer networks. Values for a particular graph 
metric may capture a graph's resilience to failure or its rout- 
ing efficiency. Knowledge of appropriate metric values may 
influence the engineering of future topologies, repair strate- 
gies in the face of failure, and understanding of fundamen- 
tal properties of existing networks. Unfortunately, there are 
typically no algorithms to generate graphs matching one or 
more proposed metrics and there is little understanding of 
the relationships among individual metrics or their applica- 
bility to different settings. 

We present a new, systematic approach for analyzing network
topologies. We first introduce the dK-series of probability
distributions specifying all degree correlations within
d-sized subgraphs of a given graph G. Increasing values
of d capture progressively more properties of G at the cost
of a more complex representation of the probability
distribution. Using this series, we can quantitatively measure the
distance between two graphs and construct random graphs
that accurately reproduce virtually all metrics proposed in
the literature. The nature of the dK-series implies that it
will also capture any future metrics that may be proposed.
Using our approach, we construct graphs for d = 0, 1, 2, 3
and demonstrate that these graphs reproduce, with increasing
accuracy, important properties of measured and modeled
Internet topologies. We find that the d = 2 case is sufficient
for most practical purposes, while d = 3 essentially
reconstructs the Internet AS- and router-level topologies exactly.
We hope that a systematic method to analyze and synthesize
topologies offers a significant improvement to the set of
tools available to network topology and protocol researchers.

Categories and Subject Descriptors 

C.2.1 [Network Architecture and Design]: Network 
topology; G.3 [Probability and Statistics]: Distribution 
functions, multivariate statistics, correlation and regression 
1. INTRODUCTION

Knowledge of network topology is crucial for understand- 
ing and predicting the performance, robustness, and scala- 
bility of network protocols and applications. Routing and 
searching in networks, robustness to random network fail- 
ures and targeted attacks, the speed of worms spreading, 
and common strategies for traffic engineering and network 
management all depend on the topological characteristics of 
a given network. 

Research involving network topology, particularly Inter- 
net topology, generally investigates the following questions: 
generation: can we efficiently generate ensembles of
random but "realistic" topologies by reproducing a set
of simple graph metrics?

simulations: how does some (new) protocol or application
perform on a set of these "realistic" topologies?

evolution: what are the forces driving the evolution
(growth) of a given network?



Figure 1 illustrates the methodologies used to answer these
questions in its left, bottom, and right parts, respectively.
Common to all of the methodologies is a set of practically
important graph properties used for analyzing and comparing
sets of graphs, shown in the center box of the figure. Many such
properties have been defined and explored in the literature.
We briefly discuss some of them in Section 2. Unfortunately,
there are no known algorithms to construct random graphs
with given values of most of these properties, since they
typically characterize the global structure of the topology,
making it difficult or impossible to algorithmically reproduce
them.

This paper introduces a finite set of reproducible graph
properties, the dK-series, to describe and constrain random
graphs in successively finer detail. In the limit, these
properties describe any given graph completely. In our
approach, we make use of probability distributions, the
dK-distributions, specifying node degree correlations within
subgraphs of size d in some given input graph. We call
dK-graphs the sets of graphs constrained by given values of
dK-distributions. Producing a family of 0K-graphs for a given
input graph requires reproducing only the average node
degree of the original graph, while producing a family of
1K-graphs requires reproducing the original graph's node degree
distribution, the 1K-distribution. 2K-graphs reproduce the
joint degree distribution, the 2K-distribution, of the
original graph: the probability that a randomly selected link
connects nodes of degrees k and k'. 3K-graphs consider
interconnectivity among triples of nodes, and so forth.
Generally, the set of (d + 1)K-graphs is a subset of dK-graphs.
In other words, larger values of d further constrain the
number of possible graphs. Overall, larger values of d capture
increasingly complex properties of the original graph.
However, generating dK-graphs for large values of d also becomes
increasingly computationally complex.

A key contribution of this paper is to define the series
of dK-graphs and dK-distributions and to employ them for
generating and analyzing network topologies. Specifically,
we develop and implement new algorithms for constructing
2K- and 3K-graphs; algorithms to generate 0K- and
1K-graphs are already known. For a variety of measured and
modeled Internet AS- and router-level topologies, we find
that reproducing their 3K-distributions is sufficient to
accurately reproduce all graph properties we have encountered
so far.

Our initial experiments suggest that the dK-series has 
the potential to deliver two primary benefits. First, it can 
serve as a basis for classification and unification of a vari- 
ety of graph metrics proposed in the literature. Second, it 
establishes a path towards construction of random graphs 
matching any complex graph properties, beyond the sim- 
ple per-node properties considered by existing approaches 
to network topology generation. 

2. IMPORTANT TOPOLOGY METRICS 

In this section we outline a list of graph metrics that have 
been found important in the networking literature. This 
list is not complete, but we believe it is sufficiently diverse 
and comprehensive to be used as a good indicator of graph 
similarity in subsequent sections. In addition, our primary 
concern is how accurately we can reproduce important
metrics. One can find statistical analysis of these metrics for
Internet topologies in [30] and, more recently, in [20].



The spectrum of a graph is the set of eigenvalues of its
Laplacian L. The matrix elements of L are
$L_{ij} = -1/\sqrt{k_i k_j}$
if there is a link between a $k_i$-degree node i and a $k_j$-degree
node j; otherwise they are 0, or 1 if i = j. All the eigenvalues
lie between 0 and 2. Of particular importance are the smallest
non-zero and largest eigenvalues, $\lambda_1$ and $\lambda_{n-1}$,
where n is the graph size. These eigenvalues provide tight bounds
for a number of critical network characteristics [8], including
network resilience [29] and network performance [19], i.e.,
the maximum traffic throughput of the network.

The distance distribution d(x) is the number of pairs of
nodes at a distance x, divided by the total number of pairs $n^2$
(self-pairs included). This metric is a normalized version
of expansion [29]. It is also important for evaluating the
performance of routing algorithms [18] as well as the speed
with which worms spread in a network.
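
As a minimal sketch of this definition, the following Python code computes d(x) by breadth-first search from every node. It assumes an undirected, connected graph stored as a dictionary of neighbor sets and counts ordered node pairs, so that the normalization is $n^2$; the function name and the 4-node example graph are hypothetical and not taken from the paper.

```python
from collections import deque

def distance_distribution(adj):
    """Return {x: d(x)}, the fraction of ordered node pairs at distance x.

    `adj` maps each node to the set of its neighbors (assumed undirected
    and connected). Self-pairs (x = 0) are included, so the values are
    normalized by n^2.
    """
    n = len(adj)
    counts = {}
    for source in adj:
        # Breadth-first search yields shortest-path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for x in dist.values():
            counts[x] = counts.get(x, 0) + 1
    return {x: c / (n * n) for x, c in sorted(counts.items())}

# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 on node 0.
paw = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(distance_distribution(paw))
```

Running one breadth-first search per node costs O(nm) time overall, which is adequate for illustration but not tuned for large topologies.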

Betweenness is the most commonly used measure of cen- 
trality, i.e., topological importance, both for nodes and links. 
It is a weighted sum of the number of shortest paths pass- 
ing through a given node or link. As such, it estimates the 
potential traffic load on a node or link, assuming uniformly 
distributed traffic following shortest paths. Metrics such as 
link value |29| or router utilization \T5l are directly related 
to betweenness. 

Perhaps the most widely known graph property is the node 
degree distribution P(k), which specifies the probability of 
nodes having degree k in a graph. The unexpected finding
in [13] that degree distributions in Internet topologies closely
follow power laws stimulated further interest in topology
research.

The likelihood S [19] is the sum of products of degrees
of adjacent nodes. It is linearly related to the assortativity
coefficient r [25], suggested as a summary statistic of node
interconnectivity: assortative (disassortative) networks are
those where nodes with similar (dissimilar) degrees tend to
be tightly interconnected. They are more (less) robust to
both random and targeted removals of nodes and links. Li
et al. use S in [19] as a measure of graph randomness to show
that router-level topologies are not "very random": instead,
they are the result of sophisticated engineering design.
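
The definition of S translates directly into a few lines of code. The sketch below assumes a simple undirected graph given as a list of links; the helper name and the example edge list are illustrative only.

```python
def likelihood(edges):
    """Sum of k_i * k_j over all links (i, j) of an undirected simple graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return sum(degree[u] * degree[v] for u, v in edges)

# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 on node 0.
paw_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
print(likelihood(paw_edges))  # degrees are 3, 2, 2, 1
```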

Clustering C(k) is a measure of how close neighbors of
the average k-degree node are to forming a clique: C(k) is
the ratio of the average number of links between the
neighbors of k-degree nodes to the maximum number of such
links, $\binom{k}{2}$. If two neighbors of a node are connected, then
these three nodes form a triangle (3-cycle). Therefore, by
definition, C(k) is, up to this normalization, the average
number of 3-cycles involving k-degree nodes. Bu and Towsley
employ clustering to estimate accuracy of topology generators.
More recently, Fraigniaud [14] finds that a wide class of
searching/routing strategies are more efficient on strongly
clustered networks.
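
The ratio above can be sketched as follows, assuming a simple undirected graph stored as a dictionary of neighbor sets. Degrees below 2 are skipped because no neighbor pair exists for them; the function name and example graph are hypothetical.

```python
from itertools import combinations

def clustering_by_degree(adj):
    """Return {k: C(k)} for every degree k >= 2 present in the graph."""
    link_totals, node_counts = {}, {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue  # C(k) is undefined without at least one neighbor pair
        # Count links among the neighbors of this node.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        link_totals[k] = link_totals.get(k, 0) + links
        node_counts[k] = node_counts.get(k, 0) + 1
    # Average number of neighbor links, divided by the maximum k(k-1)/2.
    return {k: (link_totals[k] / node_counts[k]) / (k * (k - 1) / 2)
            for k in sorted(link_totals)}

# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 on node 0.
paw = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_by_degree(paw))
```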

3. dK-SERIES AND dK-GRAPHS

There are several problems with the graph metrics in the 
previous section. First, they derive from a wide range of 
studies, and no one has established a systematic way to de- 
termine which metrics should be used in a given scenario. 
Second, there are no known algorithms capable of construct- 
ing graphs with desired values for most of the described 
metrics, save degree distribution and, more recently,
clustering [27]. Metrics such as spectrum, distance distribution,
and betweenness characterize global graph structure, while 
known approaches to generating graphs deal only with local, 




Figure 2: The dK- and dK-random graph hierarchy.

The circles represent dK-graphs, whereas their centers
represent dK-random graphs. The cross is the nK-graph
isomorphic to a given graph G.



per-node statistics, such as the degree distribution. Third, 
this list of metrics is incomplete. In particular, it cannot in- 
clude any future metrics that may be of interest. Identifying 
such a metric might result in finding that known synthetic 
graphs do not match this new metric's value: moving along
the loops in Figure 1 can thus continue forever.

To address these problems, we focus on establishing a fi- 
nite set of mutually related properties that can form a basis 
for any topological graph study. More precisely, for any 
graph G, we wish to identify a series of graph properties 
Vd, d = 0, 1, . . ., satisfying the following requirements: 

1. constructibility: we can construct graphs having these 
properties; 

2. inclusion: any property Vd subsumes all properties Vi
with i = 0, . . . , d - 1: that is, a graph having
property Vd is guaranteed to also have all properties Vi
for i < d;

3. convergence: as d increases, the set of graphs having
property Vd "converges" to G: that is, there exists
a value of index d = D such that all graphs having
property VD are isomorphic to G.

In the rest of this section, we establish our construction of
the properties Vd, which we will call the dK-series. We
begin with the observation that the most basic properties of a
network topology characterize its connectivity. The coarsest
connectivity property is the average node degree $\bar{k} = 2m/n$,
where n = |V| and m = |E| are the numbers of nodes and
links in a given graph G(V, E). Therefore, the first
property V0 in our dK-series Vd is that the graph's average
degree $\bar{k}$ has the same value as in the given graph G. In
Figure 2 we schematically depict the set of all graphs having
property V0 as 0K-graphs, defining the largest circle.
Generalizing, we adopt the term dK-graphs to represent the set
of all graphs having property Vd.

The V0 property tells us the average number of links per
node, but it does not tell us the distribution of degrees
across nodes. In particular, we do not know the number of
nodes n(k) of each degree k in the graph. We define property
V1 to capture this information: V1 is therefore the property
that the graph's node degree distribution P(k) = n(k)/n
(see footnote 1) has the same form as in the given graph G. It is
convenient to call P(k) the 1K-distribution. V1 implies at least
as much information about the network as V0, but not vice
versa: given P(k), we find $\bar{k} = \sum_k k P(k)$. V1 provides more
information than V0, and it is therefore a more restrictive
metric: the set of 1K-graphs is a subset of the set of
0K-graphs. Figure 2 illustrates this inclusive relationship by
drawing the set of 1K-graphs inside the set of 0K-graphs.

Continuing to d = 2, we note that the degree distribution
constrains the number of nodes of each degree in
the network, but it does not describe the interconnectivity
of nodes with given degrees. That is, it does not provide
any information on the total number m(k, k') of links
between nodes of degree k and k'. We define the third
property V2 in our series as the property that the graph's
joint degree distribution (JDD) has the same form as in
the given graph G. The JDD, or the 2K-distribution, is
$P(k_1, k_2) = m(k_1, k_2)\,\mu(k_1, k_2)/(2m)$, where $\mu(k_1, k_2)$ is 2 if
$k_1 = k_2$ and 1 otherwise. The JDD describes degree
correlations for pairs of connected nodes. Given $P(k_1, k_2)$, we
can calculate $P(k) = (\bar{k}/k) \sum_{k'} P(k, k')$, but not vice versa.

Consequently, the set of 2K-graphs is a subset of the
1K-graphs. Therefore, Figure 2 depicts the smaller 2K-graph
circle inside 1K.
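
To make the relation between the JDD and the degree distribution concrete, the sketch below computes $P(k_1, k_2)$ from an edge list and then recovers $P(k)$ through the marginalization formula given above. It assumes a simple undirected graph without isolated nodes and stores $m(k_1, k_2)$ for both argument orders; the function names and example graph are hypothetical.

```python
from collections import Counter

def joint_degree_distribution(edges):
    """Return (P2, P1, k_bar) for an undirected simple graph given as edges.

    P2[(k1, k2)] = m(k1, k2) * mu(k1, k2) / (2m), stored for both orders,
    P1[k] = n(k) / n, and k_bar = 2m / n. Isolated nodes are not visible
    in an edge list and are therefore ignored (an assumption of this sketch).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n, m = len(degree), len(edges)
    m_kk = Counter()  # m(k1, k2), symmetric in its arguments
    for u, v in edges:
        k1, k2 = degree[u], degree[v]
        m_kk[(k1, k2)] += 1
        if k1 != k2:
            m_kk[(k2, k1)] += 1
    P2 = {(k1, k2): count * (2 if k1 == k2 else 1) / (2 * m)
          for (k1, k2), count in m_kk.items()}
    P1 = {k: count / n for k, count in Counter(degree.values()).items()}
    return P2, P1, 2 * m / n

def degree_distribution_from_jdd(P2, k_bar):
    """Recover P(k) = (k_bar / k) * sum over k' of P(k, k')."""
    sums = Counter()
    for (k1, _k2), value in P2.items():
        sums[k1] += value
    return {k: k_bar / k * total for k, total in sums.items()}

# Hypothetical example: a triangle (0, 1, 2) with a pendant node 3 on node 0.
P2, P1, k_bar = joint_degree_distribution([(0, 1), (0, 2), (0, 3), (1, 2)])
print(P1, degree_distribution_from_jdd(P2, k_bar))
```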

We can continue to increase the amount of connectivity
information by considering degree correlations among greater
numbers of connected nodes. To move beyond 2K, we must
begin to distinguish the various geometries that are
possible in interconnecting d nodes. To introduce V3, we require
the following two components: 1) wedges: chains of 3 nodes
connected by 2 edges, called the $P_\wedge(k_1, k_2, k_3)$ component;
and 2) triangles: cliques of 3 nodes, called the $P_\triangle(k_1, k_2, k_3)$
component:




[Figure: wedges, $P_\wedge(k_1, k_2, k_3)$, and triangles, $P_\triangle(k_1, k_2, k_3)$.]



As the two geometries occur with different frequencies among 
nodes having different degrees, we require a separate proba- 
bility distribution for each configuration. We call these two 
components taken together the 3K -distribution. 
For V4, we need the above six distributions: instead of the
indices $\wedge$ and $\triangle$ that we use for d = 3, we have all
non-isomorphic graphs of size 4 numbered by 1, . . . , 6. We note
that the order of k-arguments generally matters, although we can
permute any pair of arguments corresponding to pairs of
nodes whose swapping leaves the graph isomorphic. For
example: $P_\wedge(k_1, k_2, k_3) \neq P_\wedge(k_2, k_1, k_3) \neq P_\wedge(k_1, k_3, k_2)$,
but $P_\wedge(k_1, k_2, k_3) = P_\wedge(k_3, k_2, k_1)$.

Footnote 1: Sacrificing a certain amount of rigor, we interchangeably
use the enumeration of nodes having some property in a
given graph, e.g., n(k)/n, with the probability that a node
has this property in a graph ensemble, e.g., P(k). The two
become identical when $n \to \infty$; see [ ] for further details.

In the following figure, we illustrate properties Vd, d =
0, . . . , 4, calculated for a given graph G of size 4, where
for simplicity, values of all distributions P are the total
numbers of corresponding subgraphs, i.e., P(2, 3) = 2 means
that G contains 2 edges between 2- and 3-degree nodes.

0K: $\bar{k} = 2$
1K: P(1) = 1, P(2) = 2, P(3) = 1
2K: P(1,3) = 1, P(2,2) = 1, P(2,3) = 2
3K: $P_\wedge(1,3,2) = 2$, $P_\triangle(2,2,3) = 1$
4K: $P_4(3,2,1,2) = 1$
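
The listed counts can be checked mechanically. The sketch below assumes that G is a triangle with one pendant node, which is the only simple graph with degrees 1, 2, 2, 3 and is therefore consistent with the 1K values above; the function name and the canonical ordering of arguments (wedge ends sorted around the center degree, triangle degrees sorted) are choices of this sketch rather than the paper's.

```python
from collections import Counter
from itertools import combinations

def dk_counts(adj):
    """Return (k_bar, 1K, 2K, wedge, triangle) subgraph counts, not normalized.

    `adj` maps comparable node labels to neighbor sets of a simple graph.
    """
    deg = {v: len(nb) for v, nb in adj.items()}
    k_bar = sum(deg.values()) / len(deg)
    one_k = Counter(deg.values())
    two_k = Counter()
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                two_k[tuple(sorted((deg[u], deg[v])))] += 1
    wedges, triangles = Counter(), Counter()
    for c in adj:  # c is the middle node of a candidate 3-node chain
        for a, b in combinations(sorted(adj[c]), 2):
            if b in adj[a]:
                # a, b, c close a triangle; it is seen once from each corner
                triangles[tuple(sorted((deg[a], deg[b], deg[c])))] += 1
            else:
                low, high = sorted((deg[a], deg[b]))
                wedges[(low, deg[c], high)] += 1
    triangles = Counter({key: n // 3 for key, n in triangles.items()})
    return k_bar, one_k, two_k, wedges, triangles

# Assumed example graph: triangle (0, 1, 2) with pendant node 3 attached to 0.
paw = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(dk_counts(paw))
```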

Generalizing, we define the dK-distributions to be degree
correlations within non-isomorphic simple connected
subgraphs of size d and the dK-series Vd to be the series of
properties constraining the graph's dK-distribution to the same
form as in a given graph G. In other words, Vd tells us how
groups of d nodes with degrees $k_1, \ldots, k_d$ interconnect. In
the 'dK' acronym, 'K' represents the standard notation for
node degrees, while 'd' refers to the number of degree
arguments k of the dK-distributions $P(k_1, \ldots, k_d)$ and to the
upper bound of the distance between nodes with specified
degree correlations. Moving from Vd to Vd+1 in describing
a given graph G is somewhat similar to including the
additional (d+1)-th term of the Fourier (time) or Taylor series
representing a given function F. In both cases, we describe
wider "neighborhoods" in G or F to achieve a more accurate
representation of the original structure.

The dK-series definition satisfies the inclusion and
convergence requirements described above. Indeed, the
inclusion requirement is satisfied because any graph of size d is a
subgraph of some graph of size d + 1. Convergence follows
from the observation that in the limit of d = n, the set of
nK-graphs contains only one element: G itself. As a
consequence of the convergence property, any topology metric we
can define on G will eventually be captured by dK-graphs
with a sufficiently large d.

Hereafter, our main concerns with the dK-series become:
1) how well we can satisfy our first requirement of
constructibility and 2) how fast the dK-series converges toward
the original graph. We address these two concerns in
Sections 4 and 5.

The reason for the second concern is that the number of
probability distributions required to fully specify the
dK-distribution grows quickly with d: see [28] for the number of
non-isomorphic simple connected graphs of size d. Relative
to the existing work on topology generators, typically limited
to d = 1 [1, 22, 32], we introduce and implement algorithms
for graph construction for d = 2 and d = 3. We present
these algorithms in Section 4 and then show in Section 5 that
the dK-series converges quickly: 2K-graphs are sufficient for
most practical purposes for the graphs we consider, while
3K-graphs are essentially identical to observed and modeled
Internet topologies.

To motivate our ability to capture increasingly complex
graph properties by increasing d, we present visualizations
of dK-graphs generated using the dK-randomizing approach
we will discuss in Section 4.1.4. Figure 3 depicts random
0K-, 1K-, 2K- and 3K-graphs matching the corresponding
distributions of the HOT graph, a representative router-level
topology from [19]. This topology is particularly interesting
because, to date, reproducing router-level topologies using
only degree distributions has proven difficult. However,
a visual inspection of our generated topologies shows good
convergence properties of the dK-series: while the 0K-graph
and 1K-graph have little resemblance to the HOT topology,
the 2K-graph is much closer than the previous ones and
the 3K-graph is almost identical to the original. Although
the visual inspection is encouraging, we defer more careful
comparisons to Section 5.

4. CONSTRUCTING dK-GRAPHS

There are several approaches for constructing dK-graphs
for d = 0 and d = 1. We extended a number of these
algorithms to work for higher values of d. In Section 4.1 we
describe these approaches, their practical utility, and our
new algorithms for d > 1. In Sections 4.2 and 4.3 we
introduce dK-random graphs and dK-space explorations. We
use the latter to determine the lowest values of d such that
dK-graphs approximate a given topology with the required
degree of accuracy.

4.1 dK-graph-constructing algorithms

We classify existing approaches to constructing 0K- and
1K-graphs into the following categories: stochastic,
pseudograph, matching, and two types of rewiring: randomizing
and targeting. We attempted to extend each of these
techniques to general dK-graph construction. In this section, we
qualitatively discuss the relative merits of each of these
approaches before presenting a more quantitative comparison
in Section 5.

4.1.1 Stochastic 

The simplest and most convenient for theoretical analysis
is the stochastic approach. For 0K, reproducing an n-sized
graph with a given expected average degree $\bar{k}$ involves
connecting every pair of n nodes with probability $p_{0K} = \bar{k}/n$.
This construction forms the classical (Erdős-Rényi) random
graphs $G_{n,p}$ [12]. Recent efforts have extended this
stochastic approach to 1K [7] and 2K [3]. In these cases, one first
labels all nodes i with their expected degrees $q_i$ drawn from
the distribution P(k) and then connects pairs of nodes (i, j)
with probabilities $p_{1K}(q_i, q_j) = q_i q_j/(n\bar{q})$ or
$p_{2K}(q_i, q_j) = (\bar{q}/n) P(q_i, q_j)/(P(q_i) P(q_j))$, reproducing
the expected values of 1K- or 2K-distributions, respectively.

In theory, we could generalize this approach for any d
in two stages: 1) extraction: given a graph G, calculate
the frequencies of all (including disconnected) d-sized
subgraphs in G, and 2) construction: prepare an n-sized set of
$q_i$-labeled nodes and connect their d-sized subsets into
different subgraphs with (conditional) probabilities based on
the calculated frequencies. In practice, we find the
stochastic approach performs poorly even for 1K because of high
statistical variance. For example, many nodes with expected
degree 1 wind up with degree 0 after the construction phase,
resulting in many tiny connected components.
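
The variance problem is easy to observe in a small experiment. The following sketch implements the 1K stochastic construction with edge probability $q_i q_j/(n\bar{q})$, capped at 1; the function name, the expected-degree sequence, and the seed are illustrative assumptions.

```python
import random

def stochastic_1k(expected_degrees, rng):
    """Connect each node pair (i, j) independently with probability
    q_i * q_j / (n * q_bar), capped at 1, and return the edge list."""
    n = len(expected_degrees)
    total = sum(expected_degrees)  # equals n * q_bar
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, expected_degrees[i] * expected_degrees[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges

# Hypothetical input: 200 nodes of expected degree 1 and 10 hubs of degree 10.
q = [1] * 200 + [10] * 10
edges = stochastic_1k(q, random.Random(1))
connected = {v for edge in edges for v in edge}
print("nodes left with degree 0:", len(q) - len(connected))
```

Because each expected-degree-1 node receives a roughly Poisson-distributed number of links, a sizable fraction of such nodes ends up isolated in any single realization.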

4.1.2 Pseudograph

The pseudograph (also known as configuration) approach
is probably the most popular and widely used class of
graph-generating algorithms. In its original form [1, 24], it applies
only to the 1K case. Relative to the stochastic approach,


(a) 0K-graph

(b) 1K-graph

(c) 2K-graph

(d) 3K-graph

Figure 3: Picturizations of dK-graphs and the original HOT
graph illustrating the convergence of the dK-series.

it reproduces a given degree distribution exactly, but does 
not necessarily construct simple graphs. That is, it may 
construct graphs with both ends of an edge connected to 
the same node (self-loops) and with multiple edges between 
the same pair of nodes (loops). 

It operates as follows. Given the number of nodes, n(k),
of degree k, $n = \sum_{k=1}^{k_{\max}} n(k)$, first prepare n(k) nodes with
k stubs attached to each node, $k = 1, \ldots, k_{\max}$, and then
randomly choose pairs of stubs and connect them to form
edges. To obtain a simple connected graph, remove all loops
and extract the largest connected component.
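
A minimal sketch of these steps, assuming the input is a dictionary from degree k to n(k): shuffling the stub list and pairing consecutive entries is one simple way to choose random stub pairs. The function names are hypothetical, and extraction of the largest connected component is omitted for brevity.

```python
import random

def pseudograph_1k(degree_counts, rng):
    """Build a 1K pseudograph; `degree_counts` maps k -> n(k).

    Returns a list of edges that may contain self-loops and multi-edges.
    """
    stubs = []
    node = 0
    for k, count in sorted(degree_counts.items()):
        for _ in range(count):
            stubs.extend([node] * k)  # attach k stubs to this node
            node += 1
    if len(stubs) % 2:
        raise ValueError("the sum of degrees must be even")
    rng.shuffle(stubs)
    # Pairing consecutive stubs of a shuffled list pairs them at random.
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

def simplify(edges):
    """Drop self-loops and collapse parallel edges into single edges."""
    return sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})

edges = pseudograph_1k({1: 4, 2: 3, 3: 2}, random.Random(7))
print(len(edges), "pseudograph edges,", len(simplify(edges)), "after cleanup")
```

Note that the cleanup step can lower some node degrees, so the simple graph only approximates the target degree distribution.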

We extended this algorithm to 2K as follows. Given the
number $m(k_1, k_2)$ of edges between $k_1$- and $k_2$-degree nodes,
$m = \sum_{k_1, k_2 = 1}^{k_{\max}} m(k_1, k_2)$, we first prepare lists of $m(k_1, k_2)$
disconnected edges and label both ends of each edge
by $k_1$ and $k_2$, $k_1, k_2 = 1, \ldots, k_{\max}$. Next, for each k,
$k = 1, \ldots, k_{\max}$, we create a list of all edge-ends labeled with k.
From this list, we randomly select groups of k edge-ends to
form the k-degree nodes in the final graph.
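
The 2K extension can be sketched in the same style. The code below assumes the target is given as a dictionary from unordered degree pairs (k1 <= k2) to edge counts; the function names are hypothetical, and the result is a pseudograph that may contain self-loops and multi-edges.

```python
import random

def pseudograph_2k(edge_counts, rng):
    """Build a 2K pseudograph; `edge_counts` maps (k1, k2), k1 <= k2, to m(k1, k2)."""
    # Step 1: prepare disconnected edges whose two ends carry degree labels.
    ends_by_degree = {}  # k -> list of (edge_id, end_index)
    edge_id = 0
    for (k1, k2), count in sorted(edge_counts.items()):
        for _ in range(count):
            ends_by_degree.setdefault(k1, []).append((edge_id, 0))
            ends_by_degree.setdefault(k2, []).append((edge_id, 1))
            edge_id += 1
    # Step 2: for each k, group random sets of k edge-ends into one node.
    owner = {}
    node = 0
    for k, ends in sorted(ends_by_degree.items()):
        if len(ends) % k:
            raise ValueError(f"edge-ends labeled {k} do not form whole nodes")
        rng.shuffle(ends)
        for start in range(0, len(ends), k):
            for end in ends[start:start + k]:
                owner[end] = node
            node += 1
    return [(owner[(e, 0)], owner[(e, 1)]) for e in range(edge_id)]

def joint_degree_counts(edges):
    """Count edges per unordered degree pair; a self-loop adds 2 to a degree."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    counts = {}
    for u, v in edges:
        key = tuple(sorted((degree[u], degree[v])))
        counts[key] = counts.get(key, 0) + 1
    return counts

target = {(1, 3): 1, (2, 2): 1, (2, 3): 2}
edges = pseudograph_2k(target, random.Random(3))
print(edges, joint_degree_counts(edges) == target)
```

Every node is assembled from exactly k ends labeled k, so the pseudograph reproduces the target edge counts exactly, whichever random grouping is drawn.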

The pseudograph algorithm works well for d = 2.
Unfortunately, we could not easily generalize it for d > 2 because
starting at d = 3, d-sized subgraphs overlap over edges. Such
overlapping introduces a series of topological constraints and
non-local dependencies among different subgraphs, and we
could not find a simple technique to preserve these
combinatorial constraints during the construction phase.

4.1.3 Matching 
The matching approach differs from the pseudograph ap- 



(e) original HOT graph 
HOT graph illustrating the convergence of dK-series. 

proach in avoiding loops during the construction phase. In 
the 1K case, the algorithm works exactly as its pseudograph
counterpart but skips pairs of stubs that form loops if
connected. We extend the matching approach to 2K in a
manner similar to our 2K pseudograph approach.

Unfortunately, loop avoidance suffers from various forms
of deadlock for both 1K and 2K. In both cases, the
algorithms can end up in incomplete configurations when not all
edges are formed, and the graph cannot be completed
because there are no suitable stub pairs remaining that can be
connected without forming loops. We devised several
techniques to deal with these problems. With these additional
techniques, we obtained good results for 2K-graphs. As
in the pseudograph case, however, we could not generalize
matching for d > 2 for essentially the same reasons related
to subgraphs' overlapping and non-locality.

4.1.4 Rewiring 

The rewiring approaches are generalizable to any d and
work well in practice. They involve dK-preserving rewiring
as illustrated in Figure 4. The main idea is to rewire
random (pairs of) edges preserving an existing form of the
dK-distribution. For d = 0, we rewire a random edge to a
random pair of nodes, thus preserving $\bar{k}$. For d = 1, we rewire
two random edges in a way that does not alter P(k), as shown
in Figure 4. If, in addition, there are at least two nodes of equal
degrees adjacent to the different edges in the edge pair, then
the same rewiring leaves P(k, k') intact. Due to the inclusion
property of the dK-series, (d + 1)K-rewirings form a subset
of dK-rewirings for d > 0. For example, to preserve 3K, we



[Figure 4 panels: 0K, 1K, and 2K rewirings of edges among
nodes with degrees k1, k2, k3, k4.]

Figure 4: dK-preserving rewiring for d = 0, 1, 2.



permit a 2K-rewiring only if it also preserves the wedge and
triangle distributions.
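
The d = 1 and d = 2 cases can be sketched as a single edge swap plus a degree test. The code below assumes a simple graph stored as a dictionary of neighbor sets and considers one particular swap orientation, (a, b), (c, d) -> (a, d), (c, b); the function names and the example path graph are hypothetical.

```python
def swap_keeps_2k(degree, edge1, edge2):
    """True if rewiring (a, b), (c, d) -> (a, d), (c, b) leaves P(k, k') intact.

    For this orientation it suffices that the two nodes keeping their
    place (a and c) or the two exchanged nodes (b and d) have equal degrees.
    """
    (a, b), (c, d) = edge1, edge2
    return degree[a] == degree[c] or degree[b] == degree[d]

def rewire_1k(adj, edge1, edge2):
    """Apply the 1K-preserving swap (a, b), (c, d) -> (a, d), (c, b) in place.

    Returns False and changes nothing if the swap would create a
    self-loop or a duplicate edge.
    """
    (a, b), (c, d) = edge1, edge2
    if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
        return False
    adj[a].remove(b)
    adj[b].remove(a)
    adj[c].remove(d)
    adj[d].remove(c)
    adj[a].add(d)
    adj[d].add(a)
    adj[c].add(b)
    adj[b].add(c)
    return True

# Hypothetical example: the path 0-1-2-3-4-5.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
degree = {v: len(nb) for v, nb in adj.items()}
if swap_keeps_2k(degree, (0, 1), (4, 3)):
    rewire_1k(adj, (0, 1), (4, 3))
print(adj)
```

Every node keeps its number of incident edges under the swap, so P(k) is preserved by construction; the extra degree test restricts the same move to 2K-preserving rewirings.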

The dK-randomizing rewiring algorithm amounts to
performing dK-preserving rewirings a sufficient number of times
for some dK-graph. A "sufficient number" means enough
rewirings for this process to lead to graphs that do not
change their properties even if we subject them to additional
rewirings. In other words, this rewiring process converges
after some number of steps, producing random graphs having
property Vd. Even for d = 1, there are no known rigorous
results regarding how quickly this process converges, but [15]
shows that this process is an irreducible, symmetric and
aperiodic Markov chain and demonstrates experimentally that
it takes O(m) steps to converge.

In our experiments in Section 5 we employ the following
strategy applicable for any d. We first calculate the
number of possible initial dK-preserving rewirings. By "initial
rewirings" we mean rewirings we can perform on a given
graph G, to differentiate them from rewirings we can apply
to graphs obtained from G after its first (and subsequent)
rewirings. We then subtract the number of rewirings that
leave the graph isomorphic. For example, rewiring of any
two (1, k)- and (1, k')-edges is a dK-preserving rewiring, for
any d, and more strongly, the graph before rewiring is
isomorphic to the graph after rewiring. We multiply this
difference by 10, and perform that number of random rewirings.
At the end of our rewiring procedure, we explicitly verify
that randomization is indeed complete and the process has
converged by further increasing the number of rewirings and
checking that all graph characteristics remain unchanged.

One obvious problem with dK-randomization is that it
requires an original graph G as input to construct its
dK-random versions. It cannot start with a description of the
dK-distribution to generate random dK-graphs as is
possible with the other construction approaches discussed above.

To address this limitation, we consider the inverse
process of dK-targeting d'K-preserving rewiring, also known as
Metropolis dynamics [23]. It incorporates the following
modification to d'K-preserving rewiring: every rewiring step is
accepted only if it moves the graph "closer" to Vd. In
practice, we can then employ targeting rewiring to construct
dK-graphs with high values of d by beginning with any
d'K-graph where d' < d. Recall that we can always compute Vd'
from Vd due to the inclusion property of the dK-series. For
instance, we can start with a graph having a given degree
distribution (d' = 1) [31], and then move it toward a
dK-graph via dK-targeting 1K-preserving rewiring.

The definition of "closer" above requires further
explanation. We require a set of distance metrics that
quantitatively differentiate two graphs based on the values of
their dK-distributions. In our experiments, we use the sum
of squares of differences between the existing and target
numbers of subgraphs of a given type. For example, in
the d = 2 case, we measure the distance between the
target graph's JDD and the JDD of the current graph being
rewired by
$\mathcal{D}_2 = \sum_{k_1, k_2} [m_{\mathrm{current}}(k_1, k_2) - m_{\mathrm{target}}(k_1, k_2)]^2$,
and at each rewiring step, we accept the rewiring only if it
decreases this distance. Note that $\mathcal{D}_2$ is non-negative and
equals zero only when reaching the target JDD. For d = 3,
this distance $\mathcal{D}_3$ is a sum of squares of differences between
the current and target numbers of wedges and triangles, and
we can generalize it to $\mathcal{D}_d$ for any d.
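
The d = 2 distance can be sketched as follows. The code assumes edge counts are keyed by unordered degree pairs (k1 <= k2), so each pair contributes once to the sum; summing over ordered pairs instead would double the off-diagonal terms without changing where the distance vanishes. The function names and the two example graphs are hypothetical.

```python
from collections import Counter

def edge_counts_by_degree(edges):
    """Return m(k1, k2) keyed by unordered degree pairs of a simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)

def d2_distance(current, target):
    """Sum of squared differences between current and target edge counts."""
    keys = set(current) | set(target)
    return sum((current.get(key, 0) - target.get(key, 0)) ** 2 for key in keys)

# Two hypothetical graphs with the same degree sequence (1, 1, 2, 2, 2, 2):
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cycle_plus_edge = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5)]
print(d2_distance(edge_counts_by_degree(path),
                  edge_counts_by_degree(cycle_plus_edge)))
```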

A potential problem with dK-targeting d'K-preserving
rewiring is that it can be nonergodic, meaning that there
might be no chain of d'K-preserving $\mathcal{D}_d$-decreasing rewirings
leading from the initial d'K-graph to the target dK-graph.
In other words, we cannot be sure beforehand that any two
d'K-graphs are connected by a sequence of d'K-preserving
and $\mathcal{D}_d$-decreasing rewirings.

To address this problem we note that the d'K-randomizing
and dK-targeting d'K-preserving rewirings are actually two
extremes of an entire family of rewiring processes. Indeed,
let $\Delta\mathcal{D}_d = \mathcal{D}_{d,\mathrm{after}} - \mathcal{D}_{d,\mathrm{before}}$ be the difference of distance
to the target dK-distribution computed before and after
a d'K-preserving rewiring step. As with the usual
dK-targeting rewiring, we accept a rewiring step if $\Delta\mathcal{D}_d < 0$,
but even if $\Delta\mathcal{D}_d > 0$, we also accept this step with
probability $e^{-\Delta\mathcal{D}_d/T}$, where T > 0 is some parameter that we
call temperature because of the similarity of the process to
simulated annealing.
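
The acceptance rule can be sketched as a small function. The treatment of a zero change in distance and of T = 0 is an assumption of this sketch: at T = 0 only strictly decreasing steps are accepted, matching pure targeting, while at T > 0 a zero change is always accepted because the exponential equals 1. The function name is hypothetical.

```python
import math
import random

def accept_rewiring(delta, temperature, rng):
    """Decide whether to accept a rewiring that changes the distance by `delta`."""
    if delta < 0:
        return True   # the step moves the graph closer to the target
    if temperature <= 0:
        return False  # zero temperature: accept distance-decreasing steps only
    # Otherwise accept with probability exp(-delta / T).
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(0)
trials = 20000
hits = sum(accept_rewiring(1.0, 1.0, rng) for _ in range(trials))
print(hits / trials, "vs", math.exp(-1.0))
```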

In the T → 0 limit, this probability goes to 0, and we have
the standard dK-targeting d'K-preserving rewiring process.
When T → ∞, the probability approaches 1, yielding the
standard d'K-randomizing rewiring process. To verify
ergodicity, we can start with a high temperature and then
gradually cool the system while monitoring any metric known
to have different values in dK- and d'K-graphs. If this
metric's value forms a continuous function of the temperature,
then our rewiring process is ergodic. Maslov et al.
performed these experiments in [21] and demonstrated
ergodicity in the case with d' = 1 and d = 2. In our experiments
in Section 5, where d and d' are below 4, we always obtain a
good match for all target graph metrics. Thus, we perform
rewiring at zero temperature without further considering
ergodicity. If, however, in some future experiments one detects
the lack of a smooth convergence of rewiring procedures,
then one should first verify ergodicity using the
methodology described above.

For all the algorithms discussed in this section, we do 
not check for graph connectedness at each step of the algo- 
rithm since: 1) it is an expensive operation and 2) all result- 
ing graphs always have giant connected components (GCCs) 
with characteristics similar to the whole disconnected graphs. 

4.2 dK-random graphs

No dK-graph-generating algorithm can quickly construct
the set of all dK-graphs because: 1) such sets are too large,
especially for small d; and, less obviously, 2) all algorithms
try to produce graphs having property Vd while remaining
unbiased (random) with respect to all other properties. One
can check directly that the last characteristic applies to all
the algorithms we have discussed above.

As a consequence, the dK-graph construction algorithms
result in non-uniform sampling of graphs with different values
of properties that are not fully defined by $\mathcal{P}_d$. More
specifically, two generated dK-graphs having different forms
of a d'K-distribution with d' > d can appear as the output
of these algorithms with drastically different probabilities.
Some dK-graphs have such a small probability of being constructed
that we can safely assume they never arise.

Table 1: The summary of dK-series.

Tag | Property symbol | dK-distribution | $\mathcal{P}_d$ defines $\mathcal{P}_{d-1}$ | Edge existence probability in stochastic constructions | Maximum-entropy value of (d+1)K-distribution in dK-random graphs
0K | $\mathcal{P}_0$ | $\bar{k}$ | | $p_{0K} = \bar{k}/n$ | $P_{0K}(k) = e^{-\bar{k}} \bar{k}^k / k!$
1K | $\mathcal{P}_1$ | $P(k)$ | $\bar{k} = \sum_k k P(k)$ | $p_{1K}(q_1, q_2) = q_1 q_2 / (n\bar{q})$ |
2K | $\mathcal{P}_2$ | $P(k_1, k_2)$ | | |
3K | $\mathcal{P}_3$ | $P_\wedge(k_1, k_2, k_3)$, $P_\triangle(k_1, k_2, k_3)$ | By counting edges, we get $P(k_1, k_2) \sim \sum_k \{P_\wedge(k, k_1, k_2) + P_\triangle(k, k_1, k_2)\}/(k_1 - 1) \sim \sum_k \{P_\wedge(k_1, k_2, k) + P_\triangle(k_1, k_2, k)\}/(k_2 - 1)$, where we omit normalization coefficients. | |

For example, consider the simplest 0K stochastic construction,
i.e., the classical random graphs $G_{n,p}$. Using a
probabilistic argument, one can show that the naturally
occurring 1K-distribution (degree distribution) in these graphs
has a specific form: binomial, which is closely approximated
by the Poisson distribution $P_{0K}(k) = e^{-\bar{k}} \bar{k}^k / k!$.
The 0K stochastic algorithm may produce a graph with a
different 1K-distribution, e.g., the power law $P(k) \sim k^{-\gamma}$,
but the probability of such an outcome is extremely low. Indeed,
suppose $n \sim 10^4$, $\bar{k} \sim 5$, and $\gamma \sim 2.1$, so that the characteristic
maximum degree is $k_{\max} \sim 2000$ (we chose these
values to reflect measured values for Internet AS topologies).
In this case, the probability that a $G_{n,p}$-graph contains at
least one node with degree equal to $k_{\max}$ is dominated by
$1/2000! \sim 10^{-6600}$, and the probability that the remaining
degrees simultaneously match those required for a power law
is much lower.
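
The orders of magnitude in this argument can be checked numerically. The sketch below works in log space, since the probabilities underflow ordinary floating point; the helper names are illustrative only. A direct evaluation puts 1/2000! near 10^-5736, while the cruder bound 2000^-2000 is about 10^-6602, which appears to be the estimate quoted in the text. The Poisson probability of one node having degree 2000 at mean degree 5 comes out near 10^-4340. All three figures support the same qualitative conclusion.

```python
import math


def log10_factorial(k):
    """log10(k!) computed via the log-gamma function."""
    return math.lgamma(k + 1) / math.log(10)


def log10_poisson_pmf(k, mean):
    """log10 of the Poisson probability e^(-mean) * mean^k / k!."""
    return (-mean + k * math.log(mean) - math.lgamma(k + 1)) / math.log(10)


k_max = 2000          # characteristic maximum degree from the text
mean_degree = 5.0     # average degree from the text

print("log10(1/k_max!)      =", -log10_factorial(k_max))
print("log10(k_max^-k_max)  =", -k_max * math.log10(k_max))
print("log10 Poisson(k_max) =", log10_poisson_pmf(k_max, mean_degree))
```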

It is thus natural to introduce a set of graphs that correspond
to the graphs most likely to be generated by dK-graph-constructing
algorithms. We call such graphs the
dK-random graphs. These graphs have property $\mathcal{P}_d$ but
are unbiased with respect to any other more constraining
property. In this sense, the dK-random graphs are the maximally
random or maximum-entropy dK-graphs. Our term
maximum entropy here has the following justification. As
we have just seen, 0K-random graphs have the maximum-entropy
value of the 1K-distribution since their node degree
distribution is the distribution with the maximum entropy
among all the distributions with a fixed average. 2
The 1K-random graphs have the maximum-entropy value
of the 2K-distribution since their joint degree distribution,
$P_{1K}(k_1, k_2) = \tilde{P}(k_1)\tilde{P}(k_2)$, where $\tilde{P}(k) = kP(k)/\bar{k}$ [TT], is
the distribution with the maximum joint entropy (minimum
mutual information) 3 among all the joint distributions with
fixed marginal distributions. 4
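
To make the maximum-entropy statement concrete, the following sketch builds the 1K-random joint degree distribution from a degree distribution and evaluates its mutual information, which should vanish for a product of marginals. The degree distribution `degree_dist` is a made-up toy input, and the function names are assumptions for illustration. A perfectly assortative joint distribution with the same marginals is included for contrast.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information of a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())


def one_k_random_jdd(degree_dist):
    """Product form: edge-end distribution k P(k) / mean(k) on both ends."""
    mean_k = sum(k * p for k, p in degree_dist.items())
    edge_end = {k: k * p / mean_k for k, p in degree_dist.items()}
    return {(k1, k2): edge_end[k1] * edge_end[k2]
            for k1 in edge_end for k2 in edge_end}


degree_dist = {1: 0.5, 2: 0.3, 5: 0.2}           # hypothetical P(k)
product_jdd = one_k_random_jdd(degree_dist)

# Same marginals, but every edge joins two nodes of equal degree.
mean_k = sum(k * p for k, p in degree_dist.items())
assortative_jdd = {(k, k): k * p / mean_k for k, p in degree_dist.items()}

print("MI, 1K-random form :", mutual_information(product_jdd))
print("MI, assortative    :", mutual_information(assortative_jdd))
```

The product form gives mutual information zero up to rounding, while the assortative alternative has strictly positive mutual information and hence lower joint entropy for the same marginals.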

2 The entropy of a discrete distribution $P(x)$ is
$H[P(x)] = -\sum_x P(x) \log P(x)$. If the sample space is also finite, then
among all the distributions with a fixed average, the binomial
distribution maximizes entropy [16].
3 The mutual information of a joint distribution $P(x,y)$ is
$I[P(x,y)] = H[P(x)] + H[P(y)] - H[P(x,y)]$, where $P(x)$
and $P(y)$ are the marginal distributions.
4 In reality, the last statement generally applies only to 



The main point we extract from these observations is that
in trying to construct dK-graphs, we generally obtain graphs
from subsets of the dK-space. We call these subsets dK-random
graphs and schematically depict them as centers of
the dK-circles in Figure [5]. These graphs do have property
$\mathcal{P}_d$ and, consequently, properties $\mathcal{P}_i$ with i < d, but
they might not ever display property $\mathcal{P}_j$ with j > d since
their jK-distributions have specific, maximum-entropy values
that may be different from the jK-distribution values in
the original graph.

4.3 dK-space explorations

Often we wish to analyze the topological constraints a
given graph G appears to obey. In other cases, we are interested
in exploring the structural diversity among dK-graphs.
If we are attempting to determine the minimum d such that
all dK-graphs are similar to G, we can start with a small
value of d, generate dK-graphs, and measure their "distance"
from G. If the distance is too great, we can increase d
and repeat the process. On the other hand, to explore structural
diversity among all dK-graphs, we must generate dK-graphs
that are not random. These non-random dK-graphs
are still constrained by $\mathcal{P}_d$ but have extremely low probabilities
of being generated by unperturbed dK-graph-constructing
algorithms.

We cannot construct all dK-graphs, so we need to use
heuristics to generate some dK-graphs and adjust them according
to a distance metric that draws us closer to the types
of dK-graphs we seek. One such heuristic is based on the inclusion
feature of the dK-series. Because all dK-graphs have
the same values of dK- but not of (d+1)K-distributions,
we look for simple metrics fully defined by $\mathcal{P}_{d+1}$ but not
by $\mathcal{P}_d$. While identifying such metrics can be challenging
for high d's, we can always retreat to the following two simple
extreme metrics:

• the correlation of degrees of nodes located at distance d; 

• the concentration of d-simplices (cliques of size d + 1). 

These metrics are "extreme" in the sense that they correspond
to the (d+1)-sized subgraphs with, respectively,
the maximum (d) and minimum (1) possible diameter. We
can then construct dK-graphs with extreme values, e.g., the
smallest or largest possible, for these (extreme) metrics. The
dK-random graphs have the values of these metrics lying
somewhere in between the extremes.

the class of all (not necessarily connected) pseudographs.
Narrowing the class of graphs to simple connected graphs
introduces topological constraints affecting the maximum-entropy
form of the 2K-distribution.



If the goal is to find the smallest d that results in sufficiently
constraining graphs, we can compute the difference
between the extreme values of these metrics, as well as of
other metrics we might consider. If this difference is too
large, then the selected value of d is not constraining enough
and we need to increase it. If the goal is to visit exotic locations
in a large space of dK-graphs, then such dK-space
exploration may be used to move beyond the relatively small
circle of dK-random graphs.

To illustrate this approach in practice, we consider 1K-
and 2K-space explorations. For 1K, the simplest metric
defined by $\mathcal{P}_2$ is any scalar summary statistic of the 2K-distribution,
such as likelihood S (cf. Section ). To construct
graphs with the maximum value of S, we can run a
form of targeting 1K-preserving rewiring that accepts each
rewiring step only if it increases S. We can perform the opposite
to minimize S. This type of experiment was at the
core of recent work that led the authors of [19] to conclude
that d = 1 was not constraining enough for the topology
they considered.

To perform 2K-space explorations, we need to find simple
scalar metrics defined by $\mathcal{P}_3$. Since the 3K-distribution is
actually two distributions, $P_\wedge(k_1, k_2, k_3)$ and $P_\triangle(k_1, k_2, k_3)$,
we should have two independent scalar metrics. The second-order
likelihood $S_2$ is one such metric for $P_\wedge(k_1, k_2, k_3)$. We
define it as the sum of the products of degrees of nodes located
at the ends of wedges, $S_2 \sim \sum_{k_1, k_2, k_3} k_1 k_3 P_\wedge(k_1, k_2, k_3)$.
Any graphs with the same $P_\wedge(k_1, k_2, k_3)$ have the same $S_2$.
For the $P_\triangle(k_1, k_2, k_3)$ component, average clustering $\bar{C} \sim
\sum_{k_1, k_2, k_3} P_\triangle(k_1, k_2, k_3)$ is an appropriate candidate. We
note that these two metrics are also the two extreme metrics
in the sense defined above: $S_2$ measures the properly normalized
correlation of degrees of nodes located at distance 2,
while $\bar{C}$ describes the concentration of 2-simplices (triangles).
The 2K-explorations then amount to performing the
following two types of targeting 2K-preserving rewiring: accept
a 2K-rewiring step only if it moves 1) $S_2$, or 2) $\bar{C}$
toward its maximum or minimum.
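
As an illustration of the two scalar metrics, the sketch below computes an unnormalized second-order likelihood and the average clustering directly from an adjacency structure. Several choices here are assumptions for the example and not taken from the text: the graph is a dict mapping each node to the set of its neighbors, wedges are counted only when their two end nodes are not adjacent (so triangles are excluded), and no normalization is applied to the sum.

```python
from itertools import combinations


def second_order_likelihood(adj):
    """Sum of deg(u) * deg(w) over open wedges u - v - w."""
    total = 0
    for v, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            if w not in adj[u]:          # skip wedges closed into triangles
                total += len(adj[u]) * len(adj[w])
    return total


def average_clustering(adj):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    coeffs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)           # clustering undefined; count as 0
            continue
        links = sum(1 for u, w in combinations(sorted(nbrs), 2) if w in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(second_order_likelihood(toy))   # two open wedges, each contributing 2
print(average_clustering(toy))        # (1 + 1 + 1/3 + 0) / 4
```

A targeting 2K-preserving rewiring for exploration would recompute one of these two values after each candidate step and keep the step only when the value moves in the desired direction.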

5. EVALUATION 

In this section, we conduct a number of experiments to
demonstrate the ability of the dK-series to capture important
graph properties. We implemented all the dK-graph-constructing
algorithms from Section 4.1, applied them to
both measured and modeled Internet topologies, and calculated
all the topology metrics from Section |5| on the resulting
graphs.

We experimented with three measured AS-level topolo- 
gies, extracted from CAIDA's skitter traceroute '5 , Route- 
Views' BGP HJ, and RIPE's WHOIS \T7\ data for the 
month of March 2004, as well as with a synthetic router-level
topology, the HOT graph from [19]. The qualitative
results of our experiments are similar for the skitter and
BGP topologies, while the WHOIS topology lies somewhere
in between the skitter/BGP and HOT topologies. In the
case of skitter, comprising 9204 nodes and 28959 edges,
we will see that its degree distribution places significant constraints
upon the graph generation process. Thus, even 1K-random
graphs approximate the skitter topology reasonably
well. The HOT topology with 939 nodes and 988 edges is
at the opposite extreme. It is the least constrained; 1K-random
graphs approximate it poorly, and the dK-series'
convergence is slowest. We thus report results only for these



Table 2: Scalar graph metrics notations.

Metric | Notation
Average degree | $\bar{k}$
Assortativity coefficient | $r$
Average clustering | $\bar{C}$
Average distance | $\bar{d}$
Standard deviation of distance distribution | $\sigma_d$
Second-order likelihood | $S_2$
Smallest eigenvalue of the Laplacian | $\lambda_1$
Largest eigenvalue of the Laplacian | $\lambda_{n-1}$



Table 3: Scalar metrics for 2K-random HOT graphs
generated using different techniques.

Metric | Stochastic | Pseudograph | Matching | 2K-randomizing | 2K-targeting | Original HOT
$\bar{k}$ | 2.87 | 2.19 | 2.22 | 2.18 | 2.18 | 2.10
$r$ | -0.22 | -0.24 | -0.21 | -0.23 | -0.24 | -0.22
$\bar{d}$ | 4.99 | 6.25 | 6.22 | 6.32 | 6.35 | 6.81
$\sigma_d$ | 0.85 | 0.75 | 0.74 | 0.70 | 0.70 | 0.57



two extreme cases, skitter and HOT. 

Our results represent averages over 100 graphs generated
with a different random seed in each case, using the notation
in Table 2.
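
As a quick consistency check on the node and edge counts quoted above for the skitter and HOT graphs, the average degree of an undirected graph is twice the number of edges divided by the number of nodes. The helper name below is illustrative.

```python
def average_degree(num_nodes, num_edges):
    """Average degree of an undirected graph: each edge has two ends."""
    return 2.0 * num_edges / num_nodes


print(round(average_degree(9204, 28959), 2))   # skitter
print(round(average_degree(939, 988), 2))      # HOT
```

The results, about 6.29 and 2.10, agree with the average degrees listed for the original skitter and HOT graphs in the tables.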

5.1 Algorithmic Comparison 

We first compare the different graph generation algorithms
discussed in Section 4.1. All the algorithms give consistent
results, except the stochastic approach, which suffers from
the problems related to high statistical variance discussed
in Section 4.1.1. This conclusion immediately follows from
Figure 5 and Tables 3 and 4, showing graph metric values for
the different 2K and 3K algorithms described in Section 4.1.

In our experience, we find that dK-randomizing rewiring
is easiest to use. However, it requires the original graph as
input. If only the target dK-distribution is available and
if d ≤ 2, we find the pseudograph algorithm most appropriate
in practice. We note that our 2K version results in fewer
pseudograph "badnesses", i.e., (self-)loops and small connected
components (CCs), than PLRG [1], its commonly
known 1K counterpart. This improvement is due to the
additional constraints introduced by the 2K case. For example,
if there is only one node of high degree x and one
node of another high degree y in the original graph, then
there can be only one link of type (x,y). Our 2K modification
of the pseudograph algorithm must consequently
produce exactly one link between these two x- and y-degree
nodes, whereas in the 1K case, the algorithm tends to create
many such links. Similarly, a 1K generator tends to



Table 4: Scalar metrics for 3K-random HOT graphs
generated using different techniques.

Metric | 3K-randomizing rewiring | 3K-targeting rewiring | Original HOT
$\bar{k}$ | 2.10 | 2.13 | 2.10
$r$ | -0.22 | -0.23 | -0.22
$\bar{d}$ | 6.55 | 6.79 | 6.81
$\sigma_d$ | 0.84 | 0.72 | 0.57

Figure 5: Comparison of 2K- and 3K-graph-constructing algorithms.
Panel (a): Clustering in skitter for different 2K algorithms.



Table 5: Numbers of possible initial dK-randomizing
rewirings for the HOT graph.

d | Possible initial rewirings | Possible initial rewirings, ignoring obvious isomorphisms
0 | 435,546,699 |
1 | 477,905 | 440,355
2 | 326,409 | 268,871
3 | 146 | 44

Table 6: Comparing scalar metrics for dK-random
and skitter graphs.

Metric | 0K | 1K | 2K | 3K | skitter
$\bar{k}$ | 6.31 | 6.34 | 6.29 | 6.29 | 6.29
$r$ | | -0.24 | -0.24 | -0.24 | -0.24
$\bar{C}$ | 0.001 | 0.25 | 0.29 | 0.46 | 0.46
$\bar{d}$ | 5.17 | 3.11 | 3.08 | 3.09 | 3.12
$\sigma_d$ | 0.27 | 0.4 | 0.35 | 0.35 | 0.37
$\lambda_1$ | 0.2 | 0.03 | 0.15 | 0.1 | 0.1
$\lambda_{n-1}$ | 1.8 | 1.97 | 1.85 | 1.9 | 1.9

produce many pairs of isolated 1-degree nodes connected to
each other. Since the original graph does not have such
pairs, i.e., (1,1)-links, our 2K generator, as opposed to 1K,
does not form these small 2-node CCs either.

While the pseudograph algorithm is a good 2K-random
graph generator, we could not generalize it for d ≥ 3 (see
Section 4.1.2). Therefore, to generate dK-random graphs
with d ≥ 3 when an original graph is unavailable, we use dK-targeting
rewiring. We first bootstrap the process by constructing
1K-random graphs using the pseudograph algorithm
and then apply 2K-targeting 1K-preserving rewiring
to obtain 2K-random graphs. To produce 3K-random graphs,
we apply 3K-targeting 2K-preserving rewiring to the 2K-random
graphs obtained at the previous step.
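
The second stage of this bootstrap, 2K-targeting 1K-preserving rewiring at zero temperature, can be sketched as below. This is a toy illustration under stated assumptions, not the released generator: the graph is a list of undirected edges without self-loops or duplicates, the joint degree distribution is kept as raw edge counts per degree pair, the distance is a sum of squared count differences, and the names `degrees`, `jdd`, `distance`, and `target_2k` are invented for the example.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def jdd(edges, deg):
    """Number of edges for each unordered pair of endpoint degrees."""
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def distance(jdd_a, jdd_b):
    """Sum of squared differences between two edge-count tables."""
    return sum((jdd_a[k] - jdd_b[k]) ** 2 for k in set(jdd_a) | set(jdd_b))


def target_2k(edges, target_jdd, steps, seed=0):
    """Degree-preserving edge swaps, kept only if they approach the target."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    deg = degrees(edges)                 # unchanged by the swaps below
    current = distance(jdd(edges, deg), target_jdd)
    for _ in range(steps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                  # try both orientations of edge j
        # Proposed swap: (a,b), (c,d) -> (a,d), (c,b).
        if a == d or c == b:
            continue                     # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                     # would create a duplicate edge
        trial = list(edges)
        trial[i], trial[j] = (a, d), (c, b)
        new = distance(jdd(trial, deg), target_jdd)
        if new < current:                # zero temperature: improve only
            present -= {frozenset((a, b)), frozenset((c, d))}
            present |= {frozenset((a, d)), frozenset((c, b))}
            edges, current = trial, new
    return edges, current


# Toy run: rewire a 6-node path toward the joint degree counts of a
# graph made of one (1,1)-link plus a 4-cycle, which has the same degrees.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
target = Counter({(1, 1): 1, (2, 2): 4})
final_edges, final_distance = target_2k(path, target, steps=2000, seed=0)
print(final_edges, final_distance)
```

Recomputing the full table at every step keeps the sketch short; a practical implementation would update only the few entries affected by a swap. The same loop with a 3K distance and 2K-preserving swaps would correspond to the third stage described above.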

5.2 Topology Comparisons 

We next test the convergence of our dK-series for the skitter
and HOT graphs. Since all dK-graph-constructing algorithms
yield consistent results, we selected the simplest one,
the dK-randomizing rewiring from Section 4.1.4, to obtain
dK-random graphs in this section.

The number of possible initial dK-randomizing rewirings
is a good preliminary indicator of the size of the dK-graph
space. We show these numbers for the HOT graph in Table 5.
If we discard rewirings leading to obviously isomorphic
graphs, cf. Section 4.1.4, then the number of possible initial
rewirings is even smaller.

We compare the skitter topology with its dK-random counterparts,
d = 0, ..., 3, in Table 6 and Figure 6. We report
all the metrics calculated for the giant connected components
(GCCs). Minor discrepancies between values of average
degree $\bar{k}$ and $r$ result from GCC extractions. If we
do not extract the GCC, then $\bar{k}$ is the same as that of the
original graph for all d = 0, ..., 3, and $r$ is exactly the same
for d > 1.

We do not show degree distributions for brevity. However,
degree distributions match when considering the entire
graph and are very similar for the GCCs for all d > 0. When
d = 0, the degree distribution is binomial, as expected.

We see that all other metrics gradually converge to those
in the original graph as d increases. A value of d = 1 provides
a reasonably good description of the skitter topology,
while d = 2 matches all properties except clustering. The
3K-random graphs are identical to the original for all metrics
we consider, including clustering.

We perform the 2K-space explorations described in Section 4.3
to check if d = 2 is indeed sufficiently constraining
for the skitter topology. We observe small variations of clustering
$\bar{C}$, second-order likelihood $S_2$, and spectrum, shown
in Table 7 and Figure 7. All other metrics do not change, so
we do not show plots for them. We conclude that d = 2, i.e.,
the joint degree distribution, provides a reasonably accurate
description of observed AS-level topologies.

The HOT topology is more complex than AS-level topolo- 
gies. Earlier work argues that this topology cannot be accurately
modeled using degree distributions alone [19]. We
therefore selected the HOT topology graph as a difficult case 
for our approach. 

A preliminary inspection of visualizations in Figure 3 indicates
that the dK-series converges at a reasonable rate even
for the HOT graph. The 0K-random graph is a classical random
graph and lacks high-degree nodes, as expected. The



Figure 6: Comparison of dK-random and skitter graphs.
Panels: (a) Distance distribution (axis label: distance in hops),
(b) Betweenness, (c) Clustering.
Curves: 0K-random, 1K-random, 2K-random, 3K-random, skitter.

Figure 7: Varying clustering in 2K-graphs for skitter.

Figure 8: Distance distribution for dK-random and HOT graphs.

Figure 9: Betweenness for dK-random and HOT graphs.

Table 7: Scalar metrics for 2K-space explorations
for skitter.

Metric | Min $\bar{C}$ | Max $\bar{C}$ | Min $S_2$ | Max $S_2$ | 2K-rand. | Skitter
$\bar{k}$ | 6.29 | 6.29 | 6.29 | 6.29 | 6.29 | 6.29

1K-random graph has all the high-degree nodes we desire,
but they are crowded toward the core, a property absent in
the HOT graph. The 2K constraints start pushing the high-degree
nodes away to the periphery, while the lower-degree
nodes migrate to the core, and the 2K-random graph begins
to resemble the HOT graph. The 3K-random topology
looks remarkably similar to the HOT topology.

Of course, visual inspection of a small number of randomly
generated graphs is insufficient to demonstrate our ability
to capture important metrics of the HOT graph. Thus,
we compute the different metric values for each of the dK-random
graphs and compare them with the corresponding
value for the original HOT graph. In Table 8 and Figures 8
and 9 we see that the dK-series converges more slowly for
HOT than for skitter. Note that we do not show clustering
plots because clustering is almost zero everywhere: the
HOT topology has very few cycles; it is almost a tree. The
1K-random graphs yield a poor approximation of the original
topology, in agreement with the main argument in [19].
Both Figures 3 and 9 indicate that starting with d = 2, low-
but not high-degree nodes form the core: betweenness is
approximately as high for nodes of degree ~10 as for high-degree
nodes. Consequently, the 2K-random graphs provide
a better approximation, but not nearly as good as it was for
skitter. 5 However, the 3K-random graphs match the original
HOT topology almost exactly. We thus conclude that
the dK-series captures the essential characteristics of even
particularly difficult topologies, such as HOT, by sufficiently
increasing d, in this case to 3.

6. DISCUSSION AND FUTURE WORK 

While we feel our approach to topology analysis holds sig- 
nificant promise, a number of important avenues remain for 

5 The speed of dK-series convergence depends both on the
structure and size of an original graph. It must converge
faster for smaller input graphs of similar structure. However,
here we see that the graph structure plays a more crucial
role than its size. The dK-series converges more slowly for HOT
than for skitter, even though the former graph is an order
of magnitude smaller than the latter.



Table 8: Comparing scalar metrics for dK-random
and HOT graphs.

Metric | 0K | 1K | 2K | 3K | HOT
$\lambda_1$ | 0.01 | 0.034 | 0.005 | 0.004 | 0.004
$\lambda_{n-1}$ | 1.989 | 1.967 | 1.996 | 1.997 | 1.997




further investigation. First, one must determine appropriate
values of d to carry out studies of interest. Our experience
to date suggests that d = 2 is sufficient to reproduce
most metrics of interest and that d = 3 faithfully reproduces
all metrics we are aware of for Internet-like graphs.
It also appears likely that d = 3 will be sufficient for self-organized
small-worlds in general. This issue is particularly
important because the computational complexity of producing
dK-graphs grows rapidly with d. Studies requiring large
values of d may limit the practicality of our approach.

In general, more complex topologies may necessitate developing
algorithms for generating dK-random graphs with
high d's. We needed higher d to describe the HOT topology
as accurately as the skitter topology. The intuition behind 
this observation is that the HOT router-level topology is 
"less random" because it results from targeted design and 
engineering. The skitter AS-level topology, on the other 
hand, is "more random" since there is no single point of ex- 
ternal human control over its shape and evolution. It is a 
cumulative result of local decisions made by individual ASes. 

A second important question concerns the discrete nature
of our model. For instance, we are able to reproduce 1K-
and 2K-distributions but it is not meaningful to consider reproducing
1.4K-distributions. Consider a graph property X
not captured by 1K but successfully captured by 2K. It
could turn out that the space of 2K-random graphs overconstrains
the set of graphs reproducing X. That is, while
2K-graphs do successfully reproduce X, there may be other
graphs that also match X but are not 2K-graphs.

Fundamental to our approach is that we seek to repro- 
duce important characteristics of a given network topology. 
We cannot use our methodology to discover laws governing 
the evolutionary growth of a particular network. Rather, we 
are restricted to observing changes in degree correlations in 
graphs over time, and then generating graphs that match 
such degree correlations. However, the goals of reproducing 
important characteristics of a given set of graphs and dis- 
covering laws governing their evolution are complementary 
and even symmetric. 

They are complementary because the dK-series can simplify
the task of validating particular evolutionary models.
Consider the case where a researcher wishes to validate
a model for Internet evolution using historical connectivity
information. The process would likely involve starting
with an initial graph, e.g., reflecting connectivity from five
years ago, and generating a variety of larger graphs, e.g.,
reflecting modern-day connectivity. Of course, the resulting
graphs will not match known modern connectivity exactly.
Currently, validation would require showing that the graph
matches "well enough" for all known ad hoc graph properties.
Using the dK-series, however, it is sufficient to demonstrate
that the resulting graphs are dK-random for an appropriate
value of d, i.e., constrained by the dK-distribution
of modern Internet graphs, with d = 3 known to be sufficient
in this case. As long as the resulting graphs fall in the dK-random
space, the nature of dK-randomness explains any
graph metric variation from ground truth. This methodology
also addresses the issue of defining "well enough" above:
dK-space exploration can quantify the expected variation
in ad hoc properties not fully specified by a particular dK-distribution.

The two approaches are symmetric in that they both attempt
to generate graph models that accurately capture values
of topology metrics observed in real networks. Both
approaches have inherent tradeoffs between accuracy and
complexity. Achieving higher accuracy with the dK-series
requires greater numbers of statistical constraints with increasing
d. The number of these constraints is upper-bounded
by $n^d$ (the size of dK-distribution matrices) times the
number of possible simple connected d-sized graphs [28]. 6
Achieving higher accuracy with network evolution model- 
ing requires richer sets of system-specific external parame- 
ters 0. Every such parameter represents a degree of freedom 
in a model. By tuning larger sets of external parameters, 
one can more closely match observed data. It could be the
case, which remains to be seen, that the number of parameters
needed for evolution modeling is smaller than the
number of constraints required by the dK-series to characterize
the modern Internet structure with the same degree
of accuracy. However, with the dK-series, the same set of
constraints applies to any network, including social, biological,
communication, etc. With evolution modeling, one
must develop a separate model for each specific network.

Directions for future work all stem from the observation
that the dK-series is actually the simplest basis for statistical
analysis of correlations in complex networks. We can incorporate
any kind of technological constraints into our constructions.
In a router-level topology, for example, there is
some dependency between the number of interfaces a router
has (node degree) and their average bandwidth (betweenness/degree
ratio) [19]. In light of such observations, we can
simply adjust our rewiring algorithms (Section 4.1.4) to not
accept rewirings violating this dependency. In other words,
we can always consider ensembles of dK-random graphs subject
to various forms of external constraints imposed by the
specifics of a given network.

Another promising avenue for future work derives from 
the observation that abstracting real networks as undirected 
graphs might lose too much detail for certain tasks. For 
example, in the AS-level topology case, the link types can 
represent business AS relationships, e.g., customer-provider 
or peering. For a router-level topology, we can label links 
with bandwidth, latency, etc., and nodes with router man- 
ufacturer, geographical information, etc. Keeping such an- 
notation information for nodes and links can also be use- 
ful for other types of networks, e.g., biological, social, etc. 
We can generalize the dK-series approach to study networks



6 Although the upper bound of possible constraints increases
rapidly, sparsity of dK-distribution matrices increases even
faster. The result of this interplay is that the number of non-zero
elements of dK-distributions for any given G increases
with d first but then quickly decreases, and it is surely 1 in
the limit of d = n, cf. the example in Section 3.



with more sophisticated forms of annotations, in which case
the dK-series would describe correlations among different
types of nodes connected by different types of links within
d-sized geometries. Given the level of constraint imposed
by d = 2 and d = 3 for our studied graphs and recognizing
that including annotations would introduce significant additional
constraints to the space of dK-graphs, we believe that
2K-random annotated graphs could provide appropriate descriptions
of observed networks in a variety of settings.

In this paper, all synthetic graphs have the same size as the
given graph. We are working on appropriate strategies for
rescaling the dK-distributions to arbitrary graph sizes.

7. CONCLUSIONS 

Over the years, a number of important graph metrics 
have been proposed to compare how closely the structures of
two arbitrary graphs match and to predict the behavior of
topologies with certain metric values. Such metrics are em- 
ployed by networking researchers involved in topology con- 
struction and analysis, and by those interested in protocol 
and distributed system performance. Unfortunately, there 
is limited understanding of which metrics are appropriate 
for a given setting and, for most proposed metrics, there are 
no known algorithms for generating graphs that reproduce 
the target property. 

This paper defines a series of graph structural properties
to both systematically characterize arbitrary graphs and to
generate random graphs that match specified characteristics
of the original. The dK-distribution is a collection of distributions
describing the correlations of degrees of d connected
nodes. The properties $\mathcal{P}_d$, d = 0, ..., n, comprise the dK-series.
A random graph is said to have property $\mathcal{P}_d$ if its
dK-distribution has the same form as in a given graph G.
By increasing the value of d in the series, it is possible to
capture more complex properties of G and, in the limit,
a sufficiently large value of d yields complete information
about G's structure.

We find interesting tradeoffs in choosing the appropriate
value of d to compare two graphs or to generate random
graphs with property $\mathcal{P}_d$. As we increase d, the set of
randomly generated graphs having property $\mathcal{P}_d$ becomes increasingly
constrained and the resulting graphs are increasingly
likely to reproduce a variety of metrics of interest. At
the same time, the algorithmic complexity associated with
generating the graphs grows sharply. Thus, we present a
methodology where practitioners choose the smallest d that
captures essential graph characteristics for their study. For
the graphs that we consider, including comparatively complex
Internet AS- and router-level topologies, we find that
d = 2 is sufficient for most cases and d = 3 captures all
graph properties proposed in the literature known to us.

In this paper, we present the first algorithms for constructing
random graphs having properties $\mathcal{P}_2$ and $\mathcal{P}_3$, and sketch
an approach for extending the algorithms to arbitrary d. We
are also releasing the source code for our analysis tools to
measure an input graph's dK-distribution and our generator
able to produce random graphs possessing properties $\mathcal{P}_d$
for d < 4.

We hope that our methodology will provide a more rig- 
orous and consistent method of comparing topology graphs 
and enable protocol and application researchers to test sys- 
tem behavior under a suite of randomly generated yet ap- 
propriately constrained and realistic network topologies. 
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